Introduction

This analysis is based on the Global Country Information Dataset 2023 by Nidula Elgiriyewithana, available on Kaggle under the public license Attribution 4.0 International (CC BY 4.0). This comprehensive data set provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this data set offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

The Question

Analyze the information about all countries worldwide, then rank them to identify the top 10 and bottom 10 for selected indicators. Use visualization to gain insight into the correlation between different indicators and attributes.

1. Data Processing.

I noticed that some values are not available for some countries, but I decided not to drop countries with null values. Instead, I will gather them in a separate table during the analysis.

1.1. Data cleaning and verification:


1- Changed the name of the country “Sao Tome and Principe”, which had been loaded with unreadable characters.

2- Changed the name of the column “Density (P/Km2)” to remove the line break between “Density” and “(P/Km2)”.
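
A minimal sketch of the column rename in step 2, assuming the raw header contains a newline character between the two parts. The toy data frame `raw` is a stand-in for the real file, not the author's actual code.

```r
# Toy data frame standing in for the raw file (assumption: the header
# contains a newline between "Density" and "(P/Km2)").
raw <- data.frame(Country = c("Monaco", "Mongolia"),
                  d = c(26337, 2),
                  check.names = FALSE)
names(raw)[2] <- "Density\n(P/Km2)"

# Replace any run of whitespace (including line breaks) with one space.
names(raw) <- gsub("\\s+", " ", names(raw))

names(raw)
```

The same `gsub()` call leaves column names without line breaks unchanged, so it can be applied to all headers at once.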

The dataset summary:


## ── Data Summary ────────────────────────
##                            Values 
## Name                       dataset
## Number of rows             195    
## Number of columns          35     
## _______________________           
## Column type frequency:            
##   character                19     
##   numeric                  16     
## ________________________          
## Group variables            None

1.2. Creating a customized theme for tables and output messages.



Setting a customized output message for the case where a column has no missing data:


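library(ggplot2)

# Banner plot shown when a column has no missing values
result_missing_data <- ggplot() +
  # Light green box used as the banner background
  annotate("rect", xmin = 0, xmax = 2, ymin = 0, ymax = 0.5,
           color = "darkgreen", fill = "lightgreen", alpha = 0.5) +
  # Confirmation message centered inside the box
  annotate("text", x = 1, y = 0.25,
           label = "Good News!!! No Missing Data",
           size = 12, family = "serif", color = "darkgreen") +
  # Remove axes, grid, and background
  theme_void()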

Here is the code chunk that sets the customized table theme.


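library(gridExtra)

# Table theme: serif text, blue header, light blue body cells
tt1 <- ttheme_default(
  core = list(
    # Same value as the original "#D9E1F2"[2:1], written explicitly:
    # the vector is recycled over the body cells (NA = no fill)
    bg_params = list(fill = c(NA, "#D9E1F2"), col = "white", alpha = 0.75),
    fg_params = list(col = "black", fontface = "bold",
                     fontsize = 10, fontfamily = "serif")
  ),
  colhead = list(
    bg_params = list(fill = "#4472C4", col = "white", alpha = 0.75),
    fg_params = list(col = "white", fontface = "bold",
                     fontsize = 14, fontfamily = "serif")
  )
)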

3. Analysis and Visualization

In this phase I will take the following steps to gain insights into the top 10 and bottom 10 countries based on population density (P/Km2), population, the correlation between birth rate and fertility rate, and gross domestic product (GDP):

1- I will check the dataset for missing information.
2- I will extract the lowest 10 and highest 10 countries based on the previously mentioned indicators and attributes.
3- A variety of visualization methods will be used to visualize the results.

3.1 Density (P/Km2)


The code in this step checks the density column for missing values. If any countries have missing values, the result is a table listing those countries. If nothing is missing, an output message confirms that no data is missing.
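
The check described above could be sketched roughly as follows. The helper name `check_missing`, the toy data, and the plain-text return value are assumptions for illustration; the report itself displays a styled table or a green banner plot instead.

```r
# Hypothetical helper: return the rows where `column` is NA, or a
# confirmation message when nothing is missing.
check_missing <- function(data, column) {
  missing_rows <- data[is.na(data[[column]]), c("Country", column), drop = FALSE]
  if (nrow(missing_rows) == 0) {
    return("Good News!!! No Missing Data")
  }
  missing_rows
}

# Toy data (values are illustrative, not taken from the dataset).
toy <- data.frame(Country = c("A", "B", "C"),
                  Density = c(10, 20, 30),
                  Population = c(100, NA, 300))

check_missing(toy, "Density")     # no NA in this column
check_missing(toy, "Population")  # country "B" has an NA
```

Returning the missing rows rather than dropping them matches the decision to keep countries with null values and list them separately.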

The result of the missing-data check is:


3.1.1 Top 10 countries with the highest Density (P/Km2).


The graph shows that Monaco has the highest density, at 26,337 P/Km2.

3.1.2 Bottom 10 countries with the lowest Density (P/Km2).


The graph shows that Mongolia has the lowest density, at only 2 P/Km2.
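
Extracting the top 10 and bottom 10 rows for an indicator can be sketched in base R as below. The helper `top_n_by`, the column names, and the toy values are hypothetical; rows with a missing value are set aside before ranking.

```r
# Hypothetical helper: the n rows with the highest (or lowest) value of `column`.
top_n_by <- function(data, column, n = 10, lowest = FALSE) {
  # Skip rows where the indicator is missing
  complete <- data[!is.na(data[[column]]), , drop = FALSE]
  # Sort descending for the top list, ascending for the bottom list
  ranked <- complete[order(complete[[column]], decreasing = !lowest), , drop = FALSE]
  head(ranked, n)
}

# Toy data: 25 made-up countries, one of them with a missing density.
toy <- data.frame(Country = paste0("C", 1:25), Density = c(1:24, NA))

top10 <- top_n_by(toy, "Density")
bottom10 <- top_n_by(toy, "Density", lowest = TRUE)
```

The same helper could be reused for population and GDP by changing the `column` argument.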

3.2 World Population


In this phase I will take the following steps to gain insights into the top 10 and bottom 10 countries based on population:
1- I will check the dataset for missing population information.

2- I will extract the lowest 10 and highest 10 countries based on population.

3- A population map will be used to visualize the results highlighting the highest and lowest 10 countries based on their population.

The result of the missing-data check is:


The population information of Palestine is missing from the data set.

3.2.1 Sorting countries by population and filtering highest and lowest 10


The tables show that China has the highest population, at 1.399 billion, while Vatican City has the lowest, at only 836.

3.2.2 World Population map, highlighting Top 10 and Bottom 10.


The graph shows the 10 countries with the highest population and the 10 with the lowest population on the world map.

3.3 Birth rate correlation with Fertility rate.


In this phase I will take the following steps to gain insights into the correlation between birth rate and fertility rate. I will also filter the highest and lowest 10 countries based on both factors, and the results will be consolidated in two tables, one for each factor:
1- I will check the data set for missing information.

2- I will extract the lowest 10 and highest 10 countries based on both birth rate and fertility rate.

3- A line chart will be used to visualize the results highlighting the highest and lowest countries based on both factors.

The result of the missing-data check is:


The table below shows the countries for which either the fertility rate or the birth rate is missing from the dataset.

3.3.1 Categorizing highest and lowest 10 by Birth rate and Fertility rate.


In this step, the 10 highest and the 10 lowest countries by birth rate and by fertility rate were filtered, and the resulting tables were merged to give more in-depth insight into the correlation between the two attributes. That is why each table contains more than 10 countries.
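
To illustrate why a merged table can hold more than 10 countries, here is a small sketch with made-up data. The column names `BirthRate` and `Fertility` are assumptions, and the union via `rbind()` plus `unique()` is one possible way to merge, not necessarily the one used in the report.

```r
# Toy data: 30 made-up countries with a birth rate and a fertility rate.
toy <- data.frame(Country = paste0("C", 1:30), BirthRate = 30:1)
toy$Fertility <- toy$BirthRate / 5
toy$Fertility[11] <- 7  # C11 ranks 11th by birth rate but 1st by fertility

# Top 10 by each indicator separately
top_birth <- head(toy[order(toy$BirthRate, decreasing = TRUE), ], 10)
top_fert  <- head(toy[order(toy$Fertility, decreasing = TRUE), ], 10)

# Union of both lists: countries in either top 10, without duplicates
top_both <- unique(rbind(top_birth, top_fert))
nrow(top_both)
```

Any country that ranks in the top 10 for only one of the two indicators adds an extra row, so the merged table grows beyond 10.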

3.3.2 Correlation between Birth rate and Fertility rate columns highlighting Top and Bottom Countries.


The graphs show that Niger has the highest birth rate and fertility rate, while South Korea has the lowest. There is also a direct (positive) and strong correlation between fertility rate and birth rate.
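
The strength of such a relationship can also be expressed as a number. The sketch below computes a Pearson correlation coefficient on made-up values; the column names and figures are illustrative assumptions, not results from the dataset.

```r
# Toy values (illustrative only): birth rate and fertility rate
# rise together across five made-up countries.
toy <- data.frame(
  BirthRate = c(10, 15, 20, 30, 45),
  Fertility = c(1.2, 1.8, 2.5, 4.0, 6.5)
)

# Pearson correlation; "complete.obs" ignores rows with missing values
r <- cor(toy$BirthRate, toy$Fertility, use = "complete.obs", method = "pearson")
r
```

A coefficient close to 1 indicates a strong positive linear relationship, which is what a rising line in the chart suggests.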

3.4 Gross Domestic Product (GDP)


In this phase I will take the following steps to gain insights into the Gross Domestic Product (GDP) and to filter the highest and lowest 10 countries based on GDP:
1- I will check the dataset for missing information.

2- I will extract the lowest 10 and highest 10 countries based on GDP.

3- Two treemap charts will be used to visualize the results, highlighting the highest and lowest countries based on GDP.

The result of the missing-data check is:


The table shows the countries whose GDP is missing from the dataset.

3.4.1 Sorting countries by GDP and filtering highest and lowest 10


The table shows the 10 countries with the highest GDP and the 10 countries with the lowest GDP, in millions of USD.
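
If the GDP column is stored as text with a currency symbol and thousands separators (an assumption about the raw file, as are the example strings below), converting it to millions of USD could look roughly like this:

```r
# Assumed raw format: "$" prefix and "," as thousands separator.
gdp_text <- c("$21,427,700,000,000", "$47,000,000", NA)

# Strip "$" and "," and convert to numeric; NA stays NA
gdp_usd <- as.numeric(gsub("[$,]", "", gdp_text))

# Express the values in millions of USD
gdp_million <- gdp_usd / 1e6
gdp_million
```

Missing entries remain `NA` after the conversion, so they can still be collected in the missing-data table.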

3.4.2 Treemaps visualizing the highest and lowest 10 countries based on GDP


The graphs show that the USA has the highest GDP, at 21,427,700 million USD, while Tuvalu has the lowest, at 47 million USD.

4. Hover to know more


In this part I use a world map built with leaflet to create an interactive map. The map shows information about each country, such as official name, population, density, land area, and GDP. All you have to do is hover over a country. The map has information only for the 195 countries included in the dataset.

The End

Thank you